{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "fe30f83c09f9d89",
   "metadata": {},
   "source": [
    "<a href=\"https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/examples/data_connectors/OxylabsDemo.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "39207008-8ea1-4a0e-bbe1-a8b5e73f0de0",
   "metadata": {},
   "source": [
    "# Oxylabs Reader"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d8503fec-8c5f-49cd-acb9-a128b701ddc6",
   "metadata": {},
   "source": [
    "Use Oxylabs Reader to get information from Google Search, Amazon and YouTube.\n",
    "For more information check out the [Oxylabs documentation](https://developers.oxylabs.io/scraper-apis/web-scraper-api)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "98850676-106a-484e-85df-eaf6e1ea52d0",
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install llama-index llama-index-readers-oxylabs"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b6f919780c259fc4",
   "metadata": {},
   "source": [
    "In this notebook, we show how Oxylabs readers can be used to collect information from different sources.\n",
    "\n",
    "Firstly, import one of the Oxylabs readers.\n",
    "\n",
    "Currently available readers are:\n",
    "* OxylabsAmazonSearchReader\n",
    "* OxylabsAmazonPricingReader\n",
    "* OxylabsAmazonProductReader\n",
    "* OxylabsAmazonSellersReader\n",
    "* OxylabsAmazonBestsellersReader\n",
    "* OxylabsAmazonReviewsReader\n",
    "* OxylabsGoogleSearchReader\n",
    "* OxylabsGoogleAdsReader\n",
    "* OxylabsYoutubeTranscriptReader"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8f85f8ac372f4ca4",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "from llama_index.readers.oxylabs import OxylabsGoogleSearchReader"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b173c175b9a46b0e",
   "metadata": {},
   "source": [
    "Instantiate the reader with your username and password."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2de1828023a372d7",
   "metadata": {},
   "outputs": [],
   "source": [
    "oxylabs_username = os.environ.get(\"OXYLABS_USERNAME\")\n",
    "oxylabs_password = os.environ.get(\"OXYLABS_PASSWORD\")\n",
    "\n",
    "google_search_reader = OxylabsGoogleSearchReader(\n",
    "    oxylabs_username, oxylabs_password\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4d144e29d3535f04",
   "metadata": {},
   "source": [
    "Prepare parameters. This example will load the Google Search results for the 'iPhone 16' query with the 'Berlin, Germany' location.\n",
    "\n",
    "Check out the [documentation](https://developers.oxylabs.io/scraper-apis/web-scraper-api) for more examples."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2a7ddb31c9fa2300",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "  ORGANIC RESULTS ITEMS:\n",
      "    ORGANIC-ITEM-1:\n",
      "    POS: 1\n",
      "    URL: https://www.apple.com/de/iphone-16/\n",
      "    DESC: Dieses Design verdient ein langes Leben. Das iPhone 16 hat ein Gehäuse aus Aluminium in Raumfahrt-Qualität und durchgefärbtes Glas auf der Rückseite, das extrem ...\n",
      "    TITLE: iPhone 16 und iPhone 16 Plus - Apple (DE)\n",
      "    SITELINKS:\n",
      "      SITELINKS:\n",
      "      EXPANDED ITEMS:\n",
      "        EXPANDED-ITEM-1:\n",
      "        URL: https://www.apple.com/de/shop/buy-iphone/iphone-16-pro\n",
      "        TITLE: iPhone 16 Pro kaufen\n",
      "        EXPANDED-ITEM-2:\n",
      "        URL: https://www.apple.com/de/iphone-16-pro/\n",
      "        TITLE: iPhone 16 Pro\n",
      "  ...\n"
     ]
    }
   ],
   "source": [
    "results = google_search_reader.load_data(\n",
    "    {\"query\": \"Iphone 16\", \"parse\": True, \"geo_location\": \"Berlin, Germany\"}\n",
    ")\n",
    "\n",
    "print(results[0].text)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "90f61ddd6d0e1d8c",
   "metadata": {},
   "source": [
    "## More examples"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ee67b8a09a601367",
   "metadata": {},
   "source": [
    "### Amazon Product"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "78788caec1657f11",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "# Products\n",
      "- Item 1:\n",
      "  ## url\n",
      "    https://www.amazon.com/dp/B08D9N7RJ4?th=1&psc=1\n",
      "\n",
      "  ## asin\n",
      "    B08D9N7RJ4\n",
      "\n",
      "  ## page\n",
      "    1\n",
      "\n",
      "  ## brand\n",
      "    Philips Hue\n",
      "...\n"
     ]
    }
   ],
   "source": [
    "from llama_index.readers.oxylabs import OxylabsAmazonProductReader\n",
    "\n",
    "\n",
    "amazon_product_reader = OxylabsAmazonProductReader(\n",
    "    oxylabs_username, oxylabs_password\n",
    ")\n",
    "\n",
    "results = amazon_product_reader.load_data(\n",
    "    {\n",
    "        \"domain\": \"com\",\n",
    "        \"query\": \"B08D9N7RJ4\",\n",
    "        \"parse\": True,\n",
    "        \"context\": [{\"key\": \"autoselect_variant\", \"value\": True}],\n",
    "    }\n",
    ")\n",
    "\n",
    "print(results[0].text)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "52bd094e519817",
   "metadata": {},
   "source": [
    "### YouTube Transcript"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f3ede00b701c4665",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "# YouTube video transcripts\n",
      "- Item 1:\n",
      "  - Item 1:\n",
      "    ### transcriptSectionHeaderRenderer\n",
      "      #### startMs\n",
      "        0\n",
      "\n",
      "      #### endMs\n",
      "        25000\n",
      "\n",
      "      #### accessibility\n",
      "        ##### accessibilityData\n",
      "          ###### label\n",
      "            Introduction\n",
      "\n",
      "      #### trackingParams\n",
      "        CAIQ8bsCIhMIntXqp4f6jAMVlSqzAB2-DSWc\n",
      "\n",
      "      #### enableTappableTranscriptHeader\n",
      "        True\n",
      "\n",
      "      #### sectionHeader\n",
      "        ##### sectionHeaderViewModel\n",
      "          ###### headline\n",
      "            ###### content\n",
      "              Introduction\n",
      "  ...\n"
     ]
    }
   ],
   "source": [
    "from llama_index.readers.oxylabs import OxylabsYoutubeTranscriptReader\n",
    "\n",
    "\n",
    "youtube_transcript_reader = OxylabsYoutubeTranscriptReader(\n",
    "    oxylabs_username, oxylabs_password\n",
    ")\n",
    "\n",
    "results = youtube_transcript_reader.load_data(\n",
    "    {\n",
    "        \"query\": \"SLoqvcnwwN4\",\n",
    "        \"context\": [\n",
    "            {\"key\": \"language_code\", \"value\": \"en\"},\n",
    "            {\"key\": \"transcript_origin\", \"value\": \"uploader_provided\"},\n",
    "        ],\n",
    "    }\n",
    ")\n",
    "\n",
    "print(results[0].text)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
